Abstract


Texture has long been recognized in computer vision as an important monocular shape cue, with texture gradients yielding information on surface orientation. A more recent trend is the analysis of images in terms of local spatial frequencies, where each pixel has associated with it its own spatial frequency distribution. This has proven to be a successful method of reasoning about and exploiting many imaging phenomena. Thinking about both shape-from-texture and local spatial frequency, it seems that texture gradients would cause systematic changes in local frequency, and that these changes could be analyzed to extract shape information. However, there does not yet exist a theory that connects texture, shape, and the detailed behavior of local spatial frequency. We show in this paper how local spatial frequency is related to the surface normal of a textured surface. We find that the Fourier power spectra of any two similarly textured patches on a plane are approximately related to each other by an affine transformation. The transformation parameters are a function of the plane's surface normal. We use this relationship as the basis of a new algorithm for finding surface normals of textured shapes using the spectrogram, which is one type of local spatial frequency representation. We validate the relationship by testing the algorithm on real textures. By analyzing shape and texture in terms of the local spatial frequency representation, we can exploit the advantages of the representation for the shape-from-texture problem. Specifically, our algorithm requires no feature detection and can give correct results even when the texture is aliased.


1. Introduction


Texture has long been considered an important shape cue in monocular images, starting with observations in biological vision by Gibson[14] in 1950. The corresponding algorithms developed in computational vision exploit the systematic changes in a projected texture's appearance to find the surface normal of the underlying shape. This effect is illustrated in Figure 1, which shows a Brodatz[7] cotton canvas texture synthetically mapped onto a plate. The angle and changing depth of the plate combine to make the texture appear “smaller” as the plate recedes. A more recent trend in image understanding, also with roots in biological vision, is local spatial frequency analysis. Here, the image is represented in terms of the local spatial frequencies at every pixel: the “space/frequency representation”. Coherence and changes in local spatial frequency from point to point can be used to understand a rich set of image phenomena that cannot be analyzed easily in the space or frequency domain alone[20]. Since texture is fundamentally a frequency phenomenon, and since shape is fundamentally a spatial phenomenon, it is natural to approach the shape-from-texture problem in terms of this representation. In Figure 1, for example, we show the local Fourier power spectrum (spectrogram) in two places on the image. The frequencies on the right are higher than those on the left, due to perspective and foreshortening. However, there does not exist a theory that relates texture, shape, and the detailed behavior of local spatial frequency. In this paper, we develop a theory that predicts the systematic frequency shifts due to shape and use the theory in a new shape-from-texture algorithm based on the spectrogram. This has proven to be a simple and intuitive approach to the problem. The method is attractive because it exploits a representation that is useful for understanding other important image phenomena as well.


Figure  1:  A  textured  plate  with  part  of  its  spectrogram  superimposed 


1.1.  The  Space/Frequency  Representation 


The space/frequency representation shows the frequencies of a signal at every point in the signal. Figure 2 shows an example. The one-dimensional function of x consists of a low-frequency sinusoid with a higher-frequency sinusoid replacing the middle. The space/frequency representation, shown on the right, is necessarily a two-dimensional function of x and u, since it must show a one-dimensional frequency distribution for every point in the signal. The frequencies u are shown along the vertical axis. It is like having a little Fourier transform plotted vertically at every point along the x axis. If the original signal were a two-dimensional function of x and y (an image), then the space/frequency representation would be a four-dimensional function of x and y and the two frequencies, u and v.


Figure  2:  A  signal  and  its  space/frequency  representation 


The space/frequency representation shown in Figure 2 is ideal, and cannot be computed by any commonly used techniques. We use the image spectrogram as our instantiation of the representation. For each point in the image, we extract a square neighborhood of surrounding pixels and multiply this block of intensities by a window function that falls off at the block's edges. We compute the two-dimensional Fourier transform of this product and take the squared magnitude as the local frequency representation, giving the local power spectrum. This is the image spectrogram S(x, y, u, v), defined as


$$S(x, y, u, v) = \left| \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} w(x', y')\, f(x' - x,\, y' - y)\, e^{-i 2\pi (u x' + v y')}\, dx'\, dy' \right|^2 \tag{1}$$

where f(x, y) is the image and w(x, y) is the window function. This is what we used to compute the two light-colored blocks in Figure 1.
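As an illustration only, the following NumPy sketch computes one slice of the spectrogram, the windowed local power spectrum at a single pixel, in the discrete form just described. It assumes an image stored as a 2-D array indexed [row, column], an odd-sized square neighborhood, and a caller-supplied window array; the function name and parameters are ours, not part of the original implementation, and the mean subtraction follows the practice described in Section 3.

```python
import numpy as np


def local_power_spectrum(image, x, y, half, window):
    """Windowed local power spectrum at pixel (x, y).

    `image` is indexed [row, col] = [y, x]; the neighborhood is
    (2*half + 1) x (2*half + 1) pixels and `window` must have that shape.
    The result is fftshift-ed, so zero frequency sits at index [half, half].
    """
    block = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    block = block - block.mean()                  # suppress the d.c. peak
    spectrum = np.fft.fftshift(np.fft.fft2(block * window))
    return np.abs(spectrum) ** 2                  # squared magnitude
```

For a cosine grating with exactly 9 cycles across a 63-pixel neighborhood and a flat window, all of the power falls in the two bins at u = ±9 on the v = 0 row, as expected.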


There are several other methods of computing the space/frequency representation. The well-known ones are Gabor functions[12], the Wigner distribution[9], and wavelets[21]. We chose the spectrogram because it gives an intuitive-looking picture, provides a dense sampling in space and frequency, and comes with the well-developed theory of Fourier transforms. The method of computing the representation is really only important at the algorithmic level of our development. The basic theory of projecting frequencies applies regardless of the particular representation.


1.2.  Shape  from  Texture 


Notable work in shape-from-texture includes that done by Witkin[27], Blostein and Ahuja[5], Aloimonos[1], Bajcsy and Lieberman[3], Kender[19], Stevens[25], Kanatani and Chou[18], and Blake and Marinos[4]. When Gibson first speculated that humans could infer surface normals based on texture gradients, he assumed that the frontally viewed version of the texture had constant texture density. Most computational shape-from-texture work follows the same paradigm: assume that a certain parameter is uniform when the texture is viewed frontally, model the deformation of this parameter due to the shape of the textured object and camera projection, and then compute the surface normal of the shape by measuring the change of the parameter in the image. In our development, we assume the frequencies of the frontally viewed texture remain the same from point to point, i.e. that the frontally viewed texture is stationary. The changes in local spatial frequencies in the projected image then give information about the shape of the surface. The resulting algorithm works directly on the spectrogram of the image, requiring no feature detection. This is an important advantage over many other shape-from-texture algorithms, as it is very difficult to reliably find texels in an image. Blake and Marinos said in 1990:

Our greatest practical problems arise from isolating independent oriented elements from an [texture] image.[4]

And  Aloimonos  said  in  1988: 

There is no known algorithm that can successfully detect texels from a natural image.[1]

Thus  it  makes  sense  to  develop  an  algorithm  that  requires  no  feature  detection.  Furthermore, 
our  algorithm  does  not  even  depend  on  weak  texture  features  such  as  edges.  Instead  we 
work  with  a  dense  representation  of  local  spatial  frequency,  allowing  us  to  exploit  all  the 
useful  data  in  an  image  patch.  And  by  keeping  a  dense  representation  of  the  data,  we  can 
apply  basic  theory  all  through  the  algorithm,  allowing  us  to  easily  account  for  complicated 
phenomena  like  aliasing. 

Local spatial frequency analysis of texture started with descriptions and segmentation of frontally viewed textures. Such work includes the use of the Fourier transform by Bajcsy[2], Gramenopoulos[15], and Matsuyama et al.[22]; Gabor filters by Turner[26], Fogel and Sagi[11], and Bovik et al.[6]; and the Wigner distribution by Reed and Wechsler[24].
Starting with Bajcsy and Lieberman[3], one branch of shape-from-texture research has focused on using local spatial frequencies for the problem. They studied qualitative and quantitative aspects of windowed power spectra of images with receding ground planes. They tracked the peak frequencies from window to window, showing how the gradient of these frequencies qualitatively matched the texture gradient. They stopped short of actually computing surface orientation. Brown and Shvaytser[8] use the autocorrelation of an entire texture image to determine the slant and tilt of the textured surface. Although this is not explicitly a spatial frequency technique, it is close, because the autocorrelation is the Fourier transform of the power spectrum. Jau and Chin[17] use the Wigner distribution and examine only a scalar measure of the high spatial frequencies. These last two efforts both report good results. Instead of examining aggregate frequency characteristics, our formulation allows us to exploit the shift of each frequency component from point to point in the projected texture. This means we can take full advantage of the space/frequency representation and account for other effects like aliasing.

2.  Math 

This section contains a derivation of the connection between the surface normal of a textured surface and the local Fourier transform of the projected texture in an image. This is important because it relates a physical characteristic of a 3D scene to the measurable behavior of the projected frequencies in an image. We show how the local spatial frequencies in the image are approximately related by an affine transformation to the frontal texture's frequencies. The affine parameters are functions of known camera parameters and the unknown depth and surface normal of the texture. From this we show that the frequencies of two image patches are also related by an affine transform. If we assume the two patches come from the same plane, then the depth variable drops out, leaving the surface normal as the only unknown. We exploit this fact in our shape-from-texture algorithm in Section 3.

2.1. Coordinate Systems

Figure 3 shows the coordinate systems used in the derivation. The camera's pinhole is at the origin of the (X, Y, Z) frame. This serves as the world coordinate system, and points defined in it will be referred to with upper-case (X, Y, Z). The -Z axis is coincident with the camera's optical axis and points into the scene being imaged. The image plane is the (x, y) frame with its origin on the optical axis at a distance d behind the pinhole.
Figure 3: Coordinate systems used in derivation

We imagine that each point on the locally planar textured surface has its own coordinate frame (s, t, n), with the n axis coincident with the surface normal. The surface normal is defined with the gradient space variables (p, q), thus the unit vector along the n axis is $\hat{n} = \frac{1}{r}(p, q, 1)$, with $r = \sqrt{p^2 + q^2 + 1}$, in the world frame. The origin of this surface frame is $(\Delta X, \Delta Y, \Delta Z)$ with respect to the world frame.
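The results in Section 4 report each computed (p, q) together with an equivalent slant and tilt. A minimal sketch of that conversion follows. It assumes slant is the angle between $\hat{n}$ and the viewing axis, $\sigma = \tan^{-1}\sqrt{p^2 + q^2}$, and tilt is the image-plane direction of (p, q), $\tau = \operatorname{atan2}(q, p)$; these conventions are our assumption, chosen because they reproduce the table entries in Section 4.

```python
import math


def unit_normal(p, q):
    """Unit surface normal n = (p, q, 1) / r in the world frame."""
    r = math.sqrt(p * p + q * q + 1.0)
    return (p / r, q / r, 1.0 / r)


def slant_tilt_degrees(p, q):
    """Assumed convention: slant from the viewing axis, tilt = direction of (p, q)."""
    slant = math.degrees(math.atan(math.hypot(p, q)))
    tilt = math.degrees(math.atan2(q, p))
    return slant, tilt


print([round(a, 1) for a in slant_tilt_degrees(0.533, 0.400)])  # [33.7, 36.9]
```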


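The 4x4 homogeneous transformation matrix that locates and orients the surface frame with respect to the world frame is

$$T = \begin{bmatrix} r_{11} & r_{12} & r_{13} & \Delta X \\ r_{21} & r_{22} & r_{23} & \Delta Y \\ r_{31} & r_{32} & r_{33} & \Delta Z \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{p^2 + r q^2}{r(p^2 + q^2)} & \frac{-pq(r - 1)}{r(p^2 + q^2)} & \frac{p}{r} & \Delta X \\ \frac{-pq(r - 1)}{r(p^2 + q^2)} & \frac{r p^2 + q^2}{r(p^2 + q^2)} & \frac{q}{r} & \Delta Y \\ -\frac{p}{r} & -\frac{q}{r} & \frac{1}{r} & \Delta Z \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2}$$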
This was derived by making a single rotation of the (s, t, n) frame around the unit vector $(-q, p, 0)/\sqrt{p^2 + q^2}$ by an angle $\phi$ with $\cos\phi = 1/r$ and $\sin\phi = \sqrt{p^2 + q^2}/r$.
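The rotation can be checked numerically. The sketch below builds it from the stated axis and angle with Rodrigues' rotation formula and compares the result with the closed-form entries $r_{ij}$ of Equation (2); the function names are illustrative, and the closed form assumes (p, q) ≠ (0, 0).

```python
import numpy as np


def rotation_rodrigues(p, q):
    """Rotate about (-q, p, 0)/sqrt(p^2 + q^2) by phi, with cos(phi) = 1/r."""
    rho = np.hypot(p, q)
    r = np.sqrt(rho * rho + 1.0)
    if rho == 0.0:
        return np.eye(3)                       # frontal surface: identity
    k = np.array([-q, p, 0.0]) / rho           # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])         # cross-product matrix of k
    c, s = 1.0 / r, rho / r
    return c * np.eye(3) + s * K + (1.0 - c) * np.outer(k, k)


def rotation_closed_form(p, q):
    """The r_ij of Equation (2); requires (p, q) != (0, 0)."""
    rho2 = p * p + q * q
    r = np.sqrt(rho2 + 1.0)
    off = -p * q * (r - 1.0) / (r * rho2)
    return np.array([[(p * p + r * q * q) / (r * rho2), off, p / r],
                     [off, (q * q + r * p * p) / (r * rho2), q / r],
                     [-p / r, -q / r, 1.0 / r]])


def homogeneous_transform(p, q, origin):
    """4x4 matrix placing the surface frame at origin = (dX, dY, dZ)."""
    T = np.eye(4)
    T[:3, :3] = rotation_closed_form(p, q)
    T[:3, 3] = origin
    return T
```

The third column of either matrix is the unit normal $\hat{n}$, as it should be, since the rotation carries the n axis onto the surface normal.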
2.2. Projected Texture


This subsection concludes with an expression for a perspectively projected texture. We begin by assuming the texture on the surface is “painted” on and not a relief pattern. It is locally characterized in the (s, t, n) surface frame as a pattern of surface markings given by f(s, t). Points on this locally planar surface are given by coordinates (s, t, 0). Applying the transformation matrix, the corresponding world coordinates are

$$\begin{aligned} X &= r_{11}s + r_{12}t + \Delta X \\ Y &= r_{21}s + r_{22}t + \Delta Y \\ Z &= r_{31}s + r_{32}t + \Delta Z \end{aligned} \tag{3}$$

Under  perspective,  these  points  project  to  the  image  plane  at 


$$x = -d\,\frac{r_{11}s + r_{12}t + \Delta X}{r_{31}s + r_{32}t + \Delta Z}, \qquad y = -d\,\frac{r_{21}s + r_{22}t + \Delta Y}{r_{31}s + r_{32}t + \Delta Z} \tag{4}$$


The origin of the (s, t, n) frame thus projects to $(x_0, y_0) = (-d\,\Delta X/\Delta Z,\ -d\,\Delta Y/\Delta Z)$ on the image plane. In order to avoid carrying a coordinate offset through the calculations, we define another coordinate system, (x', y'), on the image plane that is centered at $(x_0, y_0)$ with its axes parallel to those of the image plane. Given an (s, t) on the surface,


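$$\begin{aligned} x' &= x - x_0 = -d\,\frac{r_{11}s + r_{12}t + \Delta X}{r_{31}s + r_{32}t + \Delta Z} - x_0 \\ y' &= y - y_0 = -d\,\frac{r_{21}s + r_{22}t + \Delta Y}{r_{31}s + r_{32}t + \Delta Z} - y_0 \end{aligned} \tag{5}$$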


Solving these two equations for (s, t) gives the point in the surface frame that corresponds to any point in the (x', y') frame. Doing so, using $(\Delta X, \Delta Y) = (-x_0\,\Delta Z/d,\ -y_0\,\Delta Z/d)$ and the orthonormality relationships among the vectors in the transformation matrix, we have
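$$\begin{aligned} s(x', y') &= \frac{\Delta Z\,[\,d\,(x' r_{22} - y' r_{12}) + r_{32}\,(x' y_0 - y' x_0)\,]}{d\,[\,r_{13}(x' + x_0) + r_{23}(y' + y_0) - d\,r_{33}\,]} \\ t(x', y') &= \frac{\Delta Z\,[\,d\,(y' r_{11} - x' r_{21}) + r_{31}\,(y' x_0 - x' y_0)\,]}{d\,[\,r_{13}(x' + x_0) + r_{23}(y' + y_0) - d\,r_{33}\,]} \end{aligned} \tag{6}$$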

Thus, if the brightness pattern on a locally planar patch on a textured surface is f(s, t), then the projected pattern on the image plane is a nonlinear warping of the pattern given by f(s(x', y'), t(x', y')).

2.3.  Approximating  the  Fourier  Transform 

In order to work with frequencies, we would like to find an expression for the Fourier transform of the projected texture, f(s(x', y'), t(x', y')). But the warpings represented by Equation (6) are too complex to allow us to say anything general. We can make progress by linearizing s(x', y') and t(x', y') using a truncated Taylor series around (x', y') = (0, 0). The approximation is justified since we are only examining a relatively small window of intensities around the point of interest. Because s(0, 0) = t(0, 0) = 0, the constant terms vanish and we have

$$s(x', y') \approx s_x x' + s_y y', \qquad t(x', y') \approx t_x x' + t_y y' \tag{7}$$

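with

$$\begin{aligned}
s_x &= \left.\frac{\partial s}{\partial x'}\right|_{(0,0)} = \frac{\Delta Z\,[\,d\,(r p^2 + q^2) - q\,y_0\,(p^2 + q^2)\,]}{d\,(p^2 + q^2)\,(p x_0 + q y_0 - d)} \\
s_y &= \left.\frac{\partial s}{\partial y'}\right|_{(0,0)} = \frac{\Delta Z\,[\,d\,p q\,(r - 1) + q\,x_0\,(p^2 + q^2)\,]}{d\,(p^2 + q^2)\,(p x_0 + q y_0 - d)} \\
t_x &= \left.\frac{\partial t}{\partial x'}\right|_{(0,0)} = \frac{\Delta Z\,[\,d\,p q\,(r - 1) + p\,y_0\,(p^2 + q^2)\,]}{d\,(p^2 + q^2)\,(p x_0 + q y_0 - d)} \\
t_y &= \left.\frac{\partial t}{\partial y'}\right|_{(0,0)} = \frac{\Delta Z\,[\,d\,(p^2 + r q^2) - p\,x_0\,(p^2 + q^2)\,]}{d\,(p^2 + q^2)\,(p x_0 + q y_0 - d)}
\end{aligned} \tag{8}$$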


where we have substituted the values of $r_{ij}$ from Equation (2).
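Because Equation (8) is easy to mistype, it is worth checking numerically. The sketch below assumes the conventions of Section 2.1 (scene at negative Z, image plane at distance d, patch origin placed so that it projects to $(x_0, y_0)$) and uses illustrative function names. It evaluates $(s_x, s_y, t_x, t_y)$ from Equation (8) and compares them with the inverse Jacobian of the forward projection of Equations (2) through (4), estimated by central differences.

```python
import numpy as np


def rotation(p, q):
    """Closed-form r_ij of Equation (2); requires (p, q) != (0, 0)."""
    rho2 = p * p + q * q
    r = np.sqrt(rho2 + 1.0)
    off = -p * q * (r - 1.0) / (r * rho2)
    return np.array([[(p * p + r * q * q) / (r * rho2), off, p / r],
                     [off, (q * q + r * p * p) / (r * rho2), q / r],
                     [-p / r, -q / r, 1.0 / r]])


def taylor_coefficients(p, q, x0, y0, dZ, d):
    """(s_x, s_y, t_x, t_y) from Equation (8)."""
    rho2 = p * p + q * q
    r = np.sqrt(rho2 + 1.0)
    den = d * rho2 * (p * x0 + q * y0 - d)
    s_x = dZ * (d * (r * p * p + q * q) - q * y0 * rho2) / den
    s_y = dZ * (d * p * q * (r - 1.0) + q * x0 * rho2) / den
    t_x = dZ * (d * p * q * (r - 1.0) + p * y0 * rho2) / den
    t_y = dZ * (d * (p * p + r * q * q) - p * x0 * rho2) / den
    return np.array([s_x, s_y, t_x, t_y])


def numeric_coefficients(p, q, x0, y0, dZ, d, h=1e-6):
    """Same quantities from finite differences of the projection, Equation (4)."""
    R = rotation(p, q)
    origin = np.array([-x0 * dZ / d, -y0 * dZ / d, dZ])   # projects to (x0, y0)

    def project(s, t):
        X, Y, Z = R[:, 0] * s + R[:, 1] * t + origin
        return np.array([-d * X / Z, -d * Y / Z])

    # Forward Jacobian d(x, y)/d(s, t) by central differences.
    J = np.column_stack([(project(h, 0.0) - project(-h, 0.0)) / (2 * h),
                         (project(0.0, h) - project(0.0, -h)) / (2 * h)])
    return np.linalg.inv(J).ravel()   # [[s_x, s_y], [t_x, t_y]] flattened
```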

The projected version of f(s, t) is then approximately $f(s_x x' + s_y y',\ t_x x' + t_y y')$, which is just an affine transformation (without translation) of the coordinates. A similar relationship holds in the Fourier domain, given by the following Fourier transform pairs[13]:


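$$\begin{aligned} f(x', y') &\Longleftrightarrow F(u, v) \\ f(s_x x' + s_y y',\ t_x x' + t_y y') &\Longleftrightarrow \frac{1}{|D|}\, F\!\left(\frac{t_y u - t_x v}{D},\ \frac{-s_y u + s_x v}{D}\right) \end{aligned} \tag{9}$$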


where $D = s_x t_y - s_y t_x$. Here (u, v) are spatial frequency coordinates in cycles/unit distance, an upper-case function refers to the Fourier transform of the corresponding lower-case function, and the Fourier transform is defined as


$$F(u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x', y')\, e^{-i 2\pi (u x' + v y')}\, dx'\, dy' \tag{10}$$

The  significant  conclusion  is  that  the  Fourier  transform  of  a  perspectively  projected  texture 
patch  is  approximately  an  affine  transformation  of  the  Fourier  transform  of  the  frontally 
viewed  texture.  The  affine  transformation  parameters  are  given  by  the  camera  focal  length, 
the  pixel  coordinates  of  the  point  of  interest,  and  the  depth  and  orientation  of  the  patch. 
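A single sinusoid makes the affine relation concrete: if $f(s, t) = \cos 2\pi(u_0 s + v_0 t)$, then warping its coordinates by $A = \begin{bmatrix} s_x & s_y \\ t_x & t_y \end{bmatrix}$ moves its frequency to $A^T (u_0, v_0)$, which is what the pair in Equation (9) predicts. The sketch below checks this with a discrete Fourier transform; the matrices and frequencies are arbitrary illustrative values, chosen so that the warped frequency falls exactly on a DFT bin.

```python
import numpy as np


def warped_cosine(A, k0, n):
    """Sample f(s_x x + s_y y, t_x x + t_y y), f = cos 2*pi*(k0 . (s, t)), on n x n pixels."""
    y, x = np.mgrid[0:n, 0:n].astype(float)
    s = A[0, 0] * x + A[0, 1] * y
    t = A[1, 0] * x + A[1, 1] * y
    return np.cos(2 * np.pi * (k0[0] * s + k0[1] * t))


def peak_frequency(img):
    """(u, v) in cycles/pixel of the strongest non-d.c. DFT component."""
    F = np.abs(np.fft.fft2(img))
    F[0, 0] = 0.0
    row, col = np.unravel_index(np.argmax(F), F.shape)
    return np.fft.fftfreq(img.shape[1])[col], np.fft.fftfreq(img.shape[0])[row]


A = np.array([[2.0, 0.0], [1.0, 1.0]])   # illustrative (s_x, s_y; t_x, t_y)
k0 = np.array([0.05, 0.1])               # frontal frequency (u_0, v_0)
u, v = peak_frequency(warped_cosine(A, k0, 60))
```

Here $A^T k_0 = (0.2, 0.1)$ cycles/pixel, and the DFT peak lands on that frequency or on its mirror image (-0.2, -0.1), since a real cosine has both.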

2.4.  Relation  Between  Fourier  Transforms  of  Two  Patches 


Since there is usually no way to determine what the frontally viewed texture looks like, we resort to comparing patches of the same texture at different locations in the image. We showed above that the Fourier transform of each patch is related to the Fourier transform of the frontally viewed texture by an affine transformation. This means that the Fourier transforms of the patches themselves are related by affine transformations. We will show that if we assume two patches come from the same plane, then the affine parameters connecting them are functions of known parameters and the plane's surface normal.

Suppose the two patches $f_1$ and $f_2$ are related to the frontally viewed texture by the affine parameters $(s_{x1}, s_{y1}, t_{x1}, t_{y1})$ and $(s_{x2}, s_{y2}, t_{x2}, t_{y2})$, respectively. In Fourier space, an affine transformation of the first into the second means that


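$$F_1(a_1 u + b_1 v,\ a_2 u + b_2 v) \propto F_2(u, v),$$

that is, by Equation (9),

$$\begin{aligned} &F\!\left(\frac{t_{y1}(a_1 u + b_1 v) - t_{x1}(a_2 u + b_2 v)}{D_1},\ \frac{-s_{y1}(a_1 u + b_1 v) + s_{x1}(a_2 u + b_2 v)}{D_1}\right) \\ &\qquad \propto F\!\left(\frac{t_{y2} u - t_{x2} v}{D_2},\ \frac{-s_{y2} u + s_{x2} v}{D_2}\right) \end{aligned} \tag{11}$$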


where $F_1(u, v)$ and $F_2(u, v)$ are the Fourier transforms of the two patches, $D_i = s_{xi} t_{yi} - s_{yi} t_{xi}$, and $(a_1, b_1, a_2, b_2)$ are the affine transformation parameters connecting the two Fourier transforms. Note that we have ignored phase differences here. In reality, the Fourier phases of the two patches will be different. This difference is masked because each patch is defined with respect to its own local coordinate system. In our formulation, phase would only complicate the derivation, since we discard it by computing the Fourier transform's magnitude in our algorithm.

Equating coefficients on (u, v) in Equation (11) leads to the following linear equation:


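$$\begin{bmatrix} t_{y1} & -t_{x1} & 0 & 0 \\ -s_{y1} & s_{x1} & 0 & 0 \\ 0 & 0 & t_{y1} & -t_{x1} \\ 0 & 0 & -s_{y1} & s_{x1} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ b_1 \\ b_2 \end{bmatrix} = \frac{D_1}{D_2} \begin{bmatrix} t_{y2} \\ -s_{y2} \\ -t_{x2} \\ s_{x2} \end{bmatrix} \tag{12}$$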

whose  solution  is 


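$$a_1 = \frac{s_{x1} t_{y2} - t_{x1} s_{y2}}{D_2}, \qquad a_2 = \frac{s_{y1} t_{y2} - t_{y1} s_{y2}}{D_2}, \qquad b_1 = \frac{t_{x1} s_{x2} - s_{x1} t_{x2}}{D_2}, \qquad b_2 = \frac{t_{y1} s_{x2} - s_{y1} t_{x2}}{D_2} \tag{13}$$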

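Thus, the affine parameters connecting the two Fourier transforms are functions of the affine parameters connecting the two patches to the frontally viewed texture. In order to relate this equation to the physical parameters of the camera and the textured surface, we take the values of $s_{x1}, s_{y1}, t_{x1}, t_{y1}$ and $s_{x2}, s_{y2}, t_{x2}, t_{y2}$ from Equation (8). Before doing this, however, we will make the assumption that the two texture patches have the same surface normal, i.e. $(p_1, q_1) = (p_2, q_2) = (p, q)$, and that both patches are on the same plane. If the patch origins project to $(x_{01}, y_{01})$ and $(x_{02}, y_{02})$, both origins satisfy $pX + qY + Z = \text{const}$; substituting $(\Delta X, \Delta Y) = (-x_0\,\Delta Z/d,\ -y_0\,\Delta Z/d)$ gives $\Delta Z_i\,(d - p x_{0i} - q y_{0i}) = \text{const}$, i.e.

$$\frac{\Delta Z_2}{\Delta Z_1} = \frac{d - p x_{01} - q y_{01}}{d - p x_{02} - q y_{02}} \tag{14}$$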

Substituting values from Equation (8) into Equation (13) and using Equation (14), the affine parameters connecting the Fourier transforms of the two patches are then
$$\begin{aligned} a_1 &= k\,(d - p x_{02} - q y_{01}) \\ b_1 &= k\,p\,\Delta y_0 \\ a_2 &= k\,q\,\Delta x_0 \\ b_2 &= k\,(d - p x_{01} - q y_{02}) \end{aligned} \tag{15}$$

where

$$k = \frac{d - p x_{02} - q y_{02}}{(d - p x_{01} - q y_{01})^2}, \qquad \Delta x_0 = x_{01} - x_{02}, \qquad \Delta y_0 = y_{01} - y_{02} \tag{16}$$
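The closed form can also be checked against the general solution. The following sketch uses illustrative values in consistent units and hypothetical function names. It computes $(a_1, b_1, a_2, b_2)$ in two ways for two points on one plane: from the matrix form of Equation (13), $B = A_1^T A_2^{-T}$, applied to the Taylor coefficients of Equation (8) with the second depth fixed by Equation (14); and directly from Equations (15) and (16).

```python
import numpy as np


def taylor_matrix(p, q, x0, y0, dZ, d):
    """[[s_x, s_y], [t_x, t_y]] from Equation (8)."""
    rho2 = p * p + q * q
    r = np.sqrt(rho2 + 1.0)
    den = d * rho2 * (p * x0 + q * y0 - d)
    return np.array([
        [d * (r * p * p + q * q) - q * y0 * rho2, d * p * q * (r - 1) + q * x0 * rho2],
        [d * p * q * (r - 1) + p * y0 * rho2, d * (p * p + r * q * q) - p * x0 * rho2],
    ]) * dZ / den


def params_from_matrices(A1, A2):
    """Matrix form of Equation (13): [[a1, b1], [a2, b2]] = A1^T A2^{-T}."""
    B = A1.T @ np.linalg.inv(A2).T
    return B[0, 0], B[0, 1], B[1, 0], B[1, 1]


def params_closed_form(p, q, pt1, pt2, d):
    """Equations (15) and (16) for two image points on one plane."""
    (x1, y1), (x2, y2) = pt1, pt2
    k = (d - p * x2 - q * y2) / (d - p * x1 - q * y1) ** 2
    return (k * (d - p * x2 - q * y1), k * p * (y1 - y2),
            k * q * (x1 - x2), k * (d - p * x1 - q * y2))


p, q, d = 0.5, -0.3, 1.0
pt1, pt2 = (0.2, -0.1), (-0.15, 0.25)
dZ1 = -10.0
dZ2 = dZ1 * (d - p * pt1[0] - q * pt1[1]) / (d - p * pt2[0] - q * pt2[1])  # Eq. (14)
A1 = taylor_matrix(p, q, *pt1, dZ1, d)
A2 = taylor_matrix(p, q, *pt2, dZ2, d)
```

In this form the depth has dropped out entirely: only (p, q), d, and the two image positions remain.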

The notable feature of these equations is that the only unknowns are (p, q). This allows us to use a simple algorithm that determines the correct surface normal by finding which (p, q) generates the affine parameters that best transform one patch into another. In our algorithm we actually use the squared magnitude of the Fourier transform, but the same affine parameters apply, since squaring the magnitude does not change the frequency coordinates at which each transform is evaluated.

To summarize this section, we first showed how a locally planar surface patch projects by perspective into the image. Since this projection is complicated, we approximated it with a truncated Taylor series. This gave an affine relationship between the frontally viewed texture and the projected texture. A property of the Fourier transform says that an affine transformation in space corresponds to an affine transformation in frequency. Since the Fourier transform of each image patch is related by an affine transformation to the Fourier transform of the frontally viewed texture, the Fourier transforms of the image patches are also related by an affine transformation. If we assume the two patches are on the same plane, the affine parameters that connect their Fourier transforms are functions of known camera parameters and the unknown surface normal.

3.  Algorithm 

Here we discuss our core shape-from-texture algorithm, using the plate in Figure 1 as an example. The five major steps involved in computing a surface normal from an image of a textured surface are:

1. Pick two test points on the surface that have the same texture when viewed frontally.

2. Multiply the neighborhood of each point by a window function.

3. Compute the 2D Fourier transform of each windowed patch.

4. Compute the squared magnitude of each Fourier transform, giving the local power spectrum at each point (part of the spectrogram).

5. Search for the (p, q) that gives the best affine warping from one local power spectrum to the other.

We  will  consider  each  of  these  general  steps  in  this  section,  and  then  show  results  in  the  next 
section. 

Step 1 requires that we find pairs of test points on the same textured surface. In the future we hope to integrate our algorithm with a segmentation scheme. For now, however, the points must be chosen manually. Even if the test points are known to be on the same textured surface, their relative location is important. In some situations, the frequency differences on a slanted plate will be too small to determine the surface orientation accurately. For instance, consider a plate rotated slightly around a vertical axis. Any two points in the same column will show hardly any frequency shift, and the algorithm will not give the correct solution. There remains work to be done on assessing the sensitivity of this method to the relative location of test points.

In choosing a window for step 2, one must choose a shape and size. There are many different shapes of windows, and Numerical Recipes[23] puts the choice into perspective:

There is a lot of perhaps unnecessary lore about the choice of a window function, and practically every function which rises from zero to a peak and then falls again has been named after someone. ... However, at the level of this book, there is effectively no difference between any of these (or similar) window functions.

The window function we use happens to be named after two people: the “Blackman-Harris minimum 4-sample” window[16][10]. In two dimensions, its equation is

$$w(l) = w_0 + w_1 \cos\left(\frac{\pi l}{L}\right) + w_2 \cos\left(\frac{2\pi l}{L}\right) + w_3 \cos\left(\frac{3\pi l}{L}\right) \tag{17}$$

where L is the radius of the window, $0 \le l \le L$, and $l = \sqrt{x^2 + y^2}$. The coefficients are $(w_0, w_1, w_2, w_3) = (0.35875, 0.48829, 0.14128, 0.01168)$, so that $w(0) = 1$ at the center and $w(L) \approx 0$ at the edge. This function is plotted in Figure 4.
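A sketch of this radial window sampled on a square N x N grid follows. It assumes $L = (N - 1)/2$ and that the window is zero outside the inscribed circle; the text does not say how the corners of the block are treated, so that choice, like the function name, is ours.

```python
import numpy as np

# Blackman-Harris minimum 4-sample coefficients (w0, w1, w2, w3)
W = (0.35875, 0.48829, 0.14128, 0.01168)


def radial_blackman_harris(size):
    """size x size radial window: 1 at the centre, about 0 at radius L."""
    L = (size - 1) / 2.0                      # window radius in pixels
    y, x = np.mgrid[0:size, 0:size] - L
    dist = np.hypot(x, y)                     # l = sqrt(x^2 + y^2)
    w = (W[0] + W[1] * np.cos(np.pi * dist / L)
         + W[2] * np.cos(2 * np.pi * dist / L)
         + W[3] * np.cos(3 * np.pi * dist / L))
    return np.where(dist <= L, w, 0.0)        # assumption: zero beyond L
```

For the 63x63 windows used in our experiments, this gives L = 31 pixels.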


Figure  4:  Blackman-Harris  minimum  4-sample  window  function 

The choice of the window size is much more important than its exact shape. A smaller window has a smaller chance of overlapping two different texture regions in the image, which would violate one of our assumptions. The frequency of the underlying texture would also change less over the extent of a smaller window. If the frequencies change a lot, the resulting Fourier transform tends to be smeared. On the other hand, smaller windows tend to produce more smearing than larger windows even when the underlying function is stationary or close to stationary, because a shorter window gives coarser frequency resolution. This makes a larger window attractive. In our experiments, we have settled on a window size of 63x63 pixels in images that are typically 512x512 pixels. In Figure 1, the size of the two light-colored squares is equal to the window size.

One alternative is to use the “variable window spectrogram”, which we investigated in[20]. In this scheme, the window size varies with the spatial frequency. One reasonable choice is to make the window size some factor (e.g. 5) times the wavelength of the corresponding frequency, which means that we examine the same number of wavelengths at every frequency. This is closer to the idea of using wavelets and Gabor functions for computing the space/frequency representation. The Wigner distribution has the same window dilemma as the spectrogram. In this work, we use a constant-sized window to make the Fourier transform computation more efficient. We can justify it physically by noting that the high frequencies we see in textures are usually the higher harmonics of the fundamental texture frequency, meaning that their spatial extent is the same as that of the lower frequencies.

The  application  of  a  window  is  also  affected  by  the  randomness  of  the  texture.  Theoretically, 
our  method  should  work  for  both  periodic  and  random  textures.  However,  when  we  applied 
it  to  a  simulated  slanted  plate  with  a  random  fractal  texture  on  it,  we  found  the  spectrogram 
was  too  noisy  for  our  algorithm.  This  could  be  solved  by  averaging  the  power  spectra  from  a 
neighborhood  before  doing  any  further  computation.  However,  this  involves  using  more 
data,  which  has  the  same  disadvantages  as  using  a  large  window.  We  plan  to  investigate  this 
further. 


For  computing  the  Fourier  transforms  in  step  3,  we  use  a  2D  FFT  routine  from  the  IMSL 
math  library.  It  can  handle  arrays  whose  size  is  not  necessarily  an  integer  power  of  2.  Before 
we  window  the  image  intensities,  we  subtract  the  mean  intensity  value  in  the  neighborhood 
to  eliminate  the  d.c.  peak  in  the  Fourier  transform. 

We next compute the squared magnitude (power spectrum) of the Fourier transform. This is shown in the two lighter-colored squares in Figure 1. In using only the squared magnitude, we are ignoring phase information. Phase could be useful for periodic textures in a light-stripping-like algorithm. However, the phase information in a random texture would be useless. In addition, if part of a texture is occluded as in Figure 5, the phase information would be misleading, because the number of wavelengths traversed by the texture in the occluded region is unknown.


Figure  5:  Phase  information  would  be  misleading  in  this  case 


Because of varying phase, the Fourier transforms at any two general points, even on a frontally viewed texture, would be different. For the same reason, strictly speaking, Equation (11) would not hold. In order to match the phases of two patches, we would have to use a six-parameter affine transformation (including translation) rather than the four-parameter version (no translation) that we use now. By ignoring phase, we can reduce the complexity of the affine transformation and speed up the program.
The last step of our core algorithm is an exhaustive search for the (p, q) that best transforms the power spectrum of one patch into the other. Our current implementation searches over a 61x61 grid with $-2 \le p \le 2$ and $-2 \le q \le 2$, a spacing of 1/15. Along the p and q axes this covers slants up to $\tan^{-1} 2 \approx 63°$; toward the corners of the grid it reaches $\tan^{-1}(2\sqrt{2}) \approx 70.5°$. Given a (p, q) to try, we compute the corresponding affine parameters from Equation (15), use these to transform the power spectrum of the first patch using bilinear interpolation, and compute the sum of squared differences (SSD) between the two power spectra. We take the (p, q) that generates the minimum SSD as the solution. The SSD surface from the data in Figure 1 is shown in Figure 6, where we have scaled it so the minimum SSD is one.
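The search can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the experiments: the affine parameters come from the closed form of Equations (15) and (16); both spectra share one fftshift-ed grid of odd side; bilinear interpolation is written out in NumPy rather than taken from a library; and the spectra are compared without normalization. Test-point coordinates (relative to the optical axis) and d must be in the same units, pixels here, and all function names are hypothetical.

```python
import numpy as np


def affine_params(p, q, pt1, pt2, d):
    """(a1, b1, a2, b2) from Equations (15) and (16)."""
    (x1, y1), (x2, y2) = pt1, pt2
    k = (d - p * x2 - q * y2) / (d - p * x1 - q * y1) ** 2
    return (k * (d - p * x2 - q * y1), k * p * (y1 - y2),
            k * q * (x1 - x2), k * (d - p * x1 - q * y2))


def bilinear(img, rows, cols):
    """Sample img at fractional (rows, cols); zero outside the array."""
    r0, c0 = np.floor(rows).astype(int), np.floor(cols).astype(int)
    fr, fc = rows - r0, cols - c0
    out = np.zeros(rows.shape)
    for dr, dc, wgt in ((0, 0, (1 - fr) * (1 - fc)), (1, 0, fr * (1 - fc)),
                        (0, 1, (1 - fr) * fc), (1, 1, fr * fc)):
        rr, cc = r0 + dr, c0 + dc
        ok = (rr >= 0) & (rr < img.shape[0]) & (cc >= 0) & (cc < img.shape[1])
        out[ok] += wgt[ok] * img[rr[ok], cc[ok]]
    return out


def warp_spectrum(P1, a1, b1, a2, b2):
    """Evaluate P1(a1 u + b1 v, a2 u + b2 v) on P1's own centred grid."""
    c = P1.shape[0] // 2                       # index of zero frequency
    v, u = np.mgrid[0:P1.shape[0], 0:P1.shape[1]] - c
    return bilinear(P1, a2 * u + b2 * v + c, a1 * u + b1 * v + c)


def search_normal(P1, P2, pt1, pt2, d, n=61, limit=2.0):
    """Exhaustive grid search for the (p, q) with minimum SSD."""
    best_ssd, best_pq = np.inf, None
    grid = np.linspace(-limit, limit, n)
    for p in grid:
        for q in grid:
            warped = warp_spectrum(P1, *affine_params(p, q, pt1, pt2, d))
            ssd = np.sum((warped - P2) ** 2)
            if ssd < best_ssd:
                best_ssd, best_pq = ssd, (p, q)
    return best_pq
```

The grid size n trades run time for resolution; n = 61 with limit = 2 reproduces the grid described above.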


Figure  6:  SSD  surface  and  contour  plots  from  comparing  patches  in  Figure  1 


This algorithm has several advantages over other shape-from-texture algorithms. It requires no feature-finding, which is normally an unreliable step. We make no strong assumptions about the frontally viewed texture, only that it is stationary. Specifically, we do not require that the texture be isotropic. Theoretically, the method should work for both periodic and random textures, although we will have to find a better spectral power estimator before we can make it work on random textures. Finally, by formulating and solving the problem with the space/frequency representation, we can easily account for other frequency phenomena such as focus and aliasing in the same framework. We show how the method successfully deals with aliasing in the next section.
4. Results


4.1. Flat Plate

Figure 7: Four geometrically identical plates with different textures (cosines, wire screen, cotton canvas, straw cloth)


We ran our algorithm on each of these pairs of power spectra, and the results are shown in Table 1. The method works best on textures that are closest to being purely periodic, like the cosine and canvas textures. It loses some accuracy for textures with slightly more random spacing, like the wire screen and straw cloth. Considering that the algorithm examines data from only about 5.5% of the pixels in each textured region, these results are good. Most algorithms for shape-from-texture examine an entire image of a plane covering the whole field of view.

texture          window size   computed (p, q)   equivalent (σ, τ)   error
cosines          63x63         (0.533, 0.400)    (33.7°, 36.9°)      4.0°
wire screen      63x63         (0.400, 0.200)    (24.1°, 26.6°)      11.6°
cotton canvas    63x63         (0.600, 0.333)    (34.5°, 29.1°)      1.4°
straw cloth      63x63         (0.400, 0.400)    (29.5°, 45.0°)      9.7°

Table 1: Results of algorithm on textured plates

texture          window size   computed (p, q)   equivalent (σ, τ)   error
cosines          81x81         (0.600, 0.333)    (34.5°, 29.1°)      1.4°
wire screen      121x121       (0.577, 0.295)    (32.9°, 27.1°)      3.3°
cotton canvas    101x101       (0.600, 0.333)    (34.5°, 29.1°)      1.4°
straw cloth      121x121       (0.600, 0.333)    (34.5°, 29.1°)      1.4°

Table 2: Results of algorithm on textured plates with best window size
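The "equivalent (σ, τ)" columns agree with the usual gradient-space conversion, slant σ = tan⁻¹√(p² + q²) and tilt τ = tan⁻¹(q/p), with the normal taken proportional to (-p, -q, 1). The error column is presumably the angle between the computed and true normals; the true normal of the plates is not given in this section, so the sketch below only provides a generic angle function. Both function names are illustrative.

```python
import math

def slant_tilt_deg(p, q):
    """Slant and tilt (degrees) of the surface with gradient (p, q).

    Assumes the normal is proportional to (-p, -q, 1).
    """
    slant = math.degrees(math.atan(math.hypot(p, q)))
    tilt = math.degrees(math.atan2(q, p))
    return slant, tilt

def normal_angle_deg(pq_a, pq_b):
    """Angle (degrees) between the surface normals of two (p, q) pairs."""
    (pa, qa), (pb, qb) = pq_a, pq_b
    dot = pa * pb + qa * qb + 1.0              # (-pa,-qa,1) . (-pb,-qb,1)
    na = math.sqrt(pa * pa + qa * qa + 1.0)
    nb = math.sqrt(pb * pb + qb * qb + 1.0)
    cos_angle = max(-1.0, min(1.0, dot / (na * nb)))   # guard rounding
    return math.degrees(math.acos(cos_angle))

# First row of Table 1: (p, q) = (0.533, 0.400)
s, t = slant_tilt_deg(0.533, 0.400)
print(f"{s:.1f} {t:.1f}")   # prints 33.7 36.9
```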
We investigated the effect of window size by running our program on the same four textures with different window dimensions. The results are shown in Figure 8. The abscissa is the length of a side of the square window in pixels. The ordinate shows the surface normal error in degrees. For all four textures, an undersized window causes inaccuracy. This is probably because the window does not contain enough wavelengths of the texture to allow a Fourier transform of adequate resolution. For these textures, window sizes between 50 and 100 seem best. Beyond 100, the error for the canvas texture increases sharply. Although we expect an oversized window to degrade performance because of increasing non-stationarity, the other textures exhibit this tendency only slightly, if at all. If we use the data in these plots to tailor the window size manually to each image, we get the smaller errors shown in Table 2. Work remains to be done on window size: its choice is fairly arbitrary in almost all shape-from-texture algorithms that require one.
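The resolution argument can be made concrete with standard DFT arithmetic. For an N x N window, the DFT samples frequency at a spacing of 1/N cycles per pixel, and a texture component with a period of λ pixels lies N/λ bins from the origin, i.e. at a distance equal to the number of wavelengths the window spans. The bin spacing relative to that peak frequency is therefore

$$ \frac{\Delta f}{f_0} = \frac{1/N}{1/\lambda} = \frac{\lambda}{N}. $$

For a hypothetical texture period of λ = 8 pixels, a 63x63 window gives λ/N = 8/63 ≈ 0.13, whereas a 121x121 window gives 8/121 ≈ 0.066. Halving the number of wavelengths in the window doubles this relative coarseness, which is one way to read the loss of accuracy for undersized windows.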


Figure 8: Angle error (degrees) vs. window size for the four textures in Figure 7 (one panel titled "Synthetic Cosines")
4.2. Aliasing


Although aliasing can cause real problems in image understanding, it is rarely dealt with explicitly in machine vision algorithms. Aliasing occurs when the image projected on the sampling grid has spatial frequencies higher than half the spatial sampling rate. If the aliased pattern is periodic, moiré patterns appear. This is shown in Figure 9, which is geometrically the same plate as before, this time with two different, higher-frequency cosines painted on. The cosines run at ±45° from the horizontal. The left side of the plate is not aliased, while the right side is, because the projected frequencies have grown beyond half the sampling rate. The series of local power spectra across the center of the image shows what happens to the frequencies. As the peaks move out from the center, they approach the edges of the squares. The squares' edges are at half the sampling rate, and thus represent the highest frequencies that can be successfully sampled. The peaks in the first and third quadrants hit the edges in the fourth square from the left. In the next square to the right, they reappear in the second and fourth quadrants along with the peaks that were already there. This is the onset of aliasing. In the last square, the aliased peaks have moved a little farther back into the square.

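If the sampling rates in the x and y directions are f_x and f_y, respectively, then any (u, v) outside the boundaries (±f_x/2, ±f_y/2) will be aliased. It can be shown that the aliased frequency will be given by

$$ \left(u_{\mathrm{aliased}},\, v_{\mathrm{aliased}}\right) = \left(\operatorname{saw}_{f_x}\!\left(u + \frac{f_x}{2}\right) - \frac{f_x}{2},\; \operatorname{saw}_{f_y}\!\left(v + \frac{f_y}{2}\right) - \frac{f_y}{2}\right) \qquad (18) $$

where

$$ \operatorname{saw}_T(x) = x - T\left\lfloor \frac{x}{T} \right\rfloor $$

with ⌊x⌋ being the "floor" function, returning the largest integer not exceeding x. The function saw_T(x) has a period of T. We show a plot of u_aliased as a function of u in Figure 10. It shows how the unaliased frequency rises and then reappears at a different frequency when aliasing occurs.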
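A minimal sketch of the folding along one axis, assuming saw_T(x) = x - T⌊x/T⌋ and u_aliased = saw_{f_s}(u + f_s/2) - f_s/2, which maps any frequency into the half-open band [-f_s/2, f_s/2). The names `saw` and `alias` are illustrative, and the sample frequencies are arbitrary.

```python
import numpy as np

def saw(x, T):
    """Sawtooth with period T: x - T * floor(x / T), values in [0, T)."""
    return x - T * np.floor(x / T)

def alias(u, fs):
    """Apparent frequency of u when sampled at rate fs."""
    return saw(u + fs / 2.0, fs) - fs / 2.0

fs = 1.0                                  # one sample per pixel
u = np.array([0.3, 0.45, 0.6, 1.2])       # cycles per pixel
print(alias(u, fs))                       # ≈ [0.3, 0.45, -0.4, 0.2]
```

Frequencies below half the sampling rate pass through unchanged; 0.6 cycles per pixel reappears at -0.4, reproducing the rise-and-reappear shape plotted in Figure 10.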
Figure 9: Plate showing aliasing on the right

Figure 10: u_aliased as a function of u


Our algorithm allows us to account for aliasing very easily. When we test a given (p, q), we warp the frequency coordinates in one power spectrum by an affine transformation. We simply put all the transformed (u, v)'s through Equation (18) to adjust them for aliasing. This way, if a given (p, q) causes frequencies to be transformed outside the half-sampling-frequency limits, they will be aliased back in at the proper coordinates. This is also a convenient way of making sure that both frequency patches overlap exactly, instead of having one skewed off the other with no corresponding frequencies in the other patch after the affine transformation.
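The aliasing adjustment adds one step to the warp: after the frequency grid is transformed by the affine matrix A for a candidate (p, q), each transformed coordinate is folded back into the sampled band before the other spectrum is looked up. The sketch assumes one sample per pixel (f_s = 1), frequencies in cycles per pixel on an fftshift-ed grid, and the fold u ↦ saw_{f_s}(u + f_s/2) - f_s/2 for Equation (18); the function names and the example matrix are hypothetical. Like the method described here, it does not model aliased power summing with unaliased power.

```python
import numpy as np

def alias(u, fs):
    """Fold frequency u into [-fs/2, fs/2)."""
    x = u + fs / 2.0
    return x - fs * np.floor(x / fs) - fs / 2.0

def warped_coords_with_aliasing(A, n, fs=1.0):
    """Transform an n x n centred frequency grid by A, then fold each axis.

    Returns the (u, v) coordinates, in cycles/pixel, at which the other
    spectrum should be sampled; every point lands inside the band.
    """
    k = (np.arange(n) - n // 2) / n               # bin centres, cycles/pixel
    v, u = np.meshgrid(k, k, indexing="ij")       # rows ~ v, columns ~ u
    uu = A[0, 0] * u + A[0, 1] * v
    vv = A[1, 0] * u + A[1, 1] * v
    return alias(uu, fs), alias(vv, fs)

# Stretching u by 1.5 pushes the outer frequencies past 0.5 cycles/pixel;
# folding brings them back in at their aliased positions.
A = np.array([[1.5, 0.0], [0.0, 1.0]])
ua, va = warped_coords_with_aliasing(A, 16)
print(ua.min() >= -0.5, ua.max() < 0.5)           # True True
```

Here the leftmost column (u = -0.5) is mapped to -0.75 and folded back to 0.25, so the warped grid still overlaps the other spectrum everywhere.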

We ran our algorithm on the left and right patches in Figure 9 and got (p, q) = (0.667, 0.467) with a window size of 63x63. This is an error of about 4.5°, so the method successfully accounts for aliasing. There are two restrictions. First, it is assumed that the first patch is not aliased. Second, we cannot yet account for the fact that aliased frequencies actually sum with nonaliased frequencies. We hope to remove this second restriction in the future.

We know of no other shape-from-texture algorithm that can account for aliasing, even in this simple case. We attribute this ability to the fact that the space/frequency representation preserves essentially all the data in the original signal and that frequency is the natural domain for the analysis of aliasing.

5. Conclusion

We have advocated the use of the space/frequency representation, which shows an image's spatial and local spatial frequency characteristics simultaneously. One natural application for such a representation is the shape-from-texture problem. If we assume that the frontally viewed texture is stationary, we can expect to see systematic changes in frequency from point to point due to shape and perspective projection. We developed a new theory that predicts the detailed behavior of spatial frequencies in the image of a projected surface. Because it makes predictions at a low level, this theory can be applied to any space/frequency representation of an image. Using this theory, we developed an algorithm based on the spectrogram that successfully finds surface normals of textured surfaces by searching through gradient space. The algorithm requires no feature-finding, working instead on a low-level representation that is still convenient for analysis. Because the representation is low-level, it should support other kinds of image analysis as well. For instance, the algorithm can easily handle simple cases of aliasing.